-
Notifications
You must be signed in to change notification settings - Fork 741
SOLR-17806: Migrate DirectUpdateHandler2 metrics to OTEL #3417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: feature/SOLR-17458
Are you sure you want to change the base?
SOLR-17806: Migrate DirectUpdateHandler2 metrics to OTEL #3417
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dealing with rollbacks is annoying... it's tempting to ignore them but I suppose we should not.
I would like to explore here monotinically increasing numbers that can be subtracted from each other to determine "pending" if needed. Maybe the prometheus endpoint could have a convenience to compute pending as well.
Meter numErrorsCumulative; | ||
|
||
// Cumulative commands | ||
AttributedLongUpDownCounter deleteByQueryCommandsCumulative; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are these "cumulative" things having a type that includes "Down"; wouldn't they never go down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They go down from a rollback command.
addCommandsCumulative.add(-addCommands.sumThenReset());
deleteByIdCommandsCumulative.add(-deleteByIdCommands.sumThenReset());
deleteByQueryCommandsCumulative.add(-deleteByQueryCommands.sumThenReset());
Although the maintenance operations below only go up, so we can move that to an normal counter actually.
updateStats = | ||
solrMetricsContext.observableLongGauge( | ||
"solr_metrics_core_update_pending_operations", | ||
"Operations pending commit. Values get reset after commit", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be better to instead track "submitted", as separate from "committed" operations? Much like the started and completed merges -- the PR Kevin Liang is working on -- #3416
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm maybe that is better. Maybe I'll reduce this to only docs_pending for this gauge instead then we'll move to submitted and committed increasing counters. I'll refactor this around and see how it looks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead an created submitted and committed and removed the pending metric. If you think this implementation is good, I'll fix up the tests accordingly for the metric. Here is what it looks like. WDTY?
# HELP solr_core_update_committed_ops_total Total number of committed update operations
# TYPE solr_core_update_committed_ops_total counter
solr_core_update_committed_ops_total{category="UPDATE",collection="gettingstarted",core="gettingstarted_shard1_replica_n1",ops="adds",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 1.0
solr_core_update_committed_ops_total{category="UPDATE",collection="gettingstarted",core="gettingstarted_shard1_replica_n1",ops="deletes_by_id",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 1.0
solr_core_update_committed_ops_total{category="UPDATE",collection="gettingstarted",core="gettingstarted_shard1_replica_n1",ops="deletes_by_query",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 1.0
# HELP solr_core_update_submitted_ops_total Total number of submitted update operations
# TYPE solr_core_update_submitted_ops_total counter
solr_core_update_submitted_ops_total{category="UPDATE",collection="gettingstarted",core="gettingstarted_shard1_replica_n1",ops="adds",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 1.0
solr_core_update_submitted_ops_total{category="UPDATE",collection="gettingstarted",core="gettingstarted_shard1_replica_n1",ops="deletes_by_id",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 1.0
solr_core_update_submitted_ops_total{category="UPDATE",collection="gettingstarted",core="gettingstarted_shard1_replica_n1",ops="deletes_by_query",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1"} 1.0
.getSolrMetricsContext() | ||
.getMetricManager() | ||
.getPrometheusMetricReader(solrCore.getCoreMetricManager().getRegistryName()); | ||
var actual = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this statement is lacking some elegance. If this pattern repeats itself; we should seek an improvement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so I overhauled the util method a bit for getting metrics in SolrMetricTestUtils
. Maybe this is a bit better? If so, I will adopt this paradigm on the other SolrCore PR I did and other PRs going forward.
solr/core/src/test/org/apache/solr/metrics/SolrMetricTestUtils.java
Outdated
Show resolved
Hide resolved
solr/core/src/test/org/apache/solr/metrics/SolrMetricTestUtils.java
Outdated
Show resolved
Hide resolved
assertEquals("new adds", 2, newAdds - adds); | ||
assertEquals("new cumulative adds", 2, newCumulativeAdds - cumulativeAdds); | ||
assertEquals( | ||
"Did not have the expected number of pending adds", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as you went through the effort of writng these messages -- fine. But I wouldn't waste your time; it's optional.
"Did not have the expected number of cumulative deletes_by_id", | ||
1, | ||
getGaugeOpDatapoint( | ||
reader, "solr_metrics_core_update_operations_cumulative", "deletes_by_id") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a quick observation -- the "metrics" part of these names seems redundant; no?
The "solr" part is I know needed to differentiate the JVM & Jetty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah but mostly kept them as it was the way the prometheus exporter prefixed all it's metrics. Definitely no need to keep them if we don't want. I removed them in the latest commit but will remove it from the other metrics in a separate PR.
@@ -15,6 +15,8 @@ | |||
limitations under the License. | |||
*/ | |||
|
|||
//NOCOMMIT: This plugin seems tied to the Admin UIs plugin management but is tied to dropwizard metrics failing some tests. | |||
// This needs to change how it gets these metrics or we need to make a shim to the /admin/plugins handler for this to support it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at PluginInfoHandler, I suppose we can continue to support that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: I wonder if "ops" would be sufficiently clear instead of "operations"? After all, we have "stats", not "statistics".
@@ -262,27 +262,29 @@ public void initializeMetrics( | |||
baseCommandsMetric, | |||
baseAttributes.get().put(OPERATION_ATTR, "deletes_by_query").build()); | |||
|
|||
var baseCommitMetric = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a "base"; why do you refer to it in the metrics for commands & merges below other than "commits"?
solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your improvements are a net improvement IMO; thanks for considering my feedback
"solr_core_update_docs_pending_commit", | ||
Labels.builder() | ||
.label("category", "UPDATE") | ||
.label("operation", "docs_pending") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that'd be "ops" now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was doing a hasSameValues
instead of equals
which is why it was still passing. Should be fixed now.
.label("category", "UPDATE") | ||
.label("operation", "docs_pending") | ||
.build(), | ||
true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: booleans like here are inscrutable when reviewing code. Not a big deal though
getGaugeOpDatapoint(reader, "solr_core_update_operations_cumulative", "adds").getValue(), | ||
getGaugeOpDatapoint(core, "solr_core_update_cumulative_ops", "adds").getValue(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
passing core instead of reader is clearer; thanks
https://issues.apache.org/jira/browse/SOLR-17806
Migrate DirectUpdateHandler2 metrics and remove Dropwizard initializations.